13 research outputs found

    A Comparative Study of Simple Online Learning Strategies for Streaming Data

    In recent years, the analysis of data streams has attracted considerable attention in research fields such as database systems and data mining. The continuous growth in data volume and the high speed at which data arrive challenge the ability of computing systems to store, process and transmit them. This has in turn driven the development of new online learning strategies capable of predicting the behavior of streaming data. This paper compares three very simple learning methods applied to static data streams using the 1-Nearest Neighbor classifier, a linear discriminant, a quadratic classifier, a decision tree, and the Naïve Bayes classifier. The three strategies have been taken from the literature. One of them includes a time-weighted strategy to remove obsolete objects from the reference set. The experiments were carried out on twelve real data sets. The aim of this experimental study is to establish the most suitable online learning model according to the performance of each classifier.
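
    As a rough illustration of removing obsolete objects from a reference set, the sketch below runs a 1-Nearest Neighbor classifier in a test-then-train loop. It is not the method evaluated in the paper: the class `ForgettingOneNN`, the `max_age` parameter, the hard age cutoff (used here in place of a real time-weighting scheme) and the toy stream are all assumptions made for this example.

```python
import math


class ForgettingOneNN:
    """1-NN over a reference set whose objects expire after max_age steps."""

    def __init__(self, max_age):
        self.max_age = max_age   # hypothetical parameter: lifetime in time steps
        self.reference = []      # list of (features, label, arrival_time)
        self.clock = 0

    def predict(self, x):
        # Return the label of the closest stored object, or None if empty.
        if not self.reference:
            return None
        nearest = min(self.reference, key=lambda item: math.dist(item[0], x))
        return nearest[1]

    def learn(self, x, label):
        # Store the new object, then drop every object that is too old.
        self.clock += 1
        self.reference.append((tuple(x), label, self.clock))
        self.reference = [
            item for item in self.reference
            if self.clock - item[2] < self.max_age
        ]


def prequential_error(stream, model):
    """Test-then-train: predict each object first, then reveal its label."""
    mistakes = 0
    for x, label in stream:
        if model.predict(x) != label:
            mistakes += 1
        model.learn(x, label)
    return mistakes / len(stream)


# Toy stream (made up for illustration): the labels swap halfway through.
stream = [((0.0,), "a"), ((1.0,), "b"), ((0.1,), "a"), ((0.9,), "b"),
          ((0.0,), "b"), ((1.0,), "a"), ((0.1,), "b"), ((0.9,), "a")]
model = ForgettingOneNN(max_age=2)
error = prequential_error(stream, model)
```

    With a short lifetime the reference set never holds more than `max_age` objects, so stale labels are discarded soon after the stream changes, at the cost of a small memory.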

    One-Sided Prototype Selection on Class Imbalanced Dissimilarity Matrices

    In the dissimilarity representation paradigm, several prototype selection methods have been used to address how to select a small representation set for generating a low-dimensional dissimilarity space. These methods have also been used to reduce the size of the dissimilarity matrix. However, these approaches assume a relatively balanced class distribution, an assumption that is grossly violated in many real-life problems, where the ratios of prior probabilities between classes are often extremely skewed. In this paper, we study the use of well-known prototype selection methods adapted to the case of learning from an imbalanced dissimilarity matrix. More specifically, we propose the use of these methods to under-sample the majority class in the dissimilarity space. The experimental results demonstrate that the one-sided selection strategy performs better than the classical prototype selection methods applied over all classes.
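
    The one-sided idea can be sketched as follows, under several assumptions: random sampling stands in for an actual prototype selection method, both rows and columns of the matrix are restricted to the kept objects, and the function names and toy data are invented for this example rather than taken from the paper.

```python
import random
from collections import Counter


def one_sided_selection(labels, n_majority, seed=0):
    """Keep every minority object; keep only n_majority majority objects.

    Random choice is a placeholder for a real prototype selection method.
    """
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    rng = random.Random(seed)
    kept_majority = rng.sample(majority_idx, n_majority)
    return sorted(minority_idx + kept_majority)


def submatrix(D, keep):
    """Restrict a square dissimilarity matrix to the kept objects."""
    return [[D[i][j] for j in keep] for i in keep]


# Toy 1-D objects (made up): six majority objects and two minority objects.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 2.0, 2.1]
labels = ["maj"] * 6 + ["min"] * 2
D = [[abs(p - q) for q in points] for p in points]

keep = one_sided_selection(labels, n_majority=2)
reduced = submatrix(D, keep)
```

    Here the 8 x 8 matrix shrinks to 4 x 4 while both minority objects survive, which is the contrast with selection applied over all classes.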

    Exploring early classification strategies of streaming data with delayed attributes

    In contrast to traditional machine learning algorithms, where all data are available in batch mode, the new paradigm of streaming data poses additional difficulties, since data samples arrive in a sequence and many hard decisions have to be made on-line. The problem addressed here consists of classifying streaming data that are not only unlabeled, but also have a number l of attributes arriving after some time delay. In this context, the main issues are what to do when the unlabeled incomplete samples and, later on, their missing attributes arrive; when and how to classify these incoming samples; and when and how to update the training set. Three different strategies (for l = 1 and constant) are explored and evaluated in terms of the accumulated classification error. The results reveal that the proposed on-line strategies, despite their simplicity, may outperform classifiers using only the original, labeled-and-complete samples as a fixed training set. In other words, learning is possible by properly tapping into the unlabeled, incomplete samples and their delayed attributes. The many research issues identified include a better understanding of the link between the inherent properties of the data set and the design of the most suitable on-line classification strategy.

    On-line learning from streaming data with delayed attributes: A comparison of classifiers and strategies

    In many real applications, data are not all available at the same time, or it is not affordable to process them all in a batch; instead, instances arrive sequentially in a stream. The scenario of streaming data introduces new challenges to the machine learning community, since difficult decisions have to be made. The problem addressed in this paper is that of classifying incoming instances for which one attribute arrives only after a given delay. In this formulation, many open issues arise, such as how to classify the incomplete instance, whether to wait for the delayed attribute before performing any classification, or when and how to update a reference set. Three different strategies are proposed which address these issues differently. Orthogonally to these strategies, three classifiers of different characteristics are used. Keeping on-line learning strategies independent of the classifiers facilitates system design and contrasts with the common alternative of carefully crafting an ad hoc classifier. To assess how good learning is under these different strategies and classifiers, they are compared using learning curves and final classification errors for fifteen data sets. Results indicate that learning in this stringent context of streaming data and delayed attributes can successfully take place even with simple on-line strategies. Furthermore, active strategies generally behave better than more conservative passive ones. Regarding the classifiers, it was found that simple instance-based classifiers such as the well-known nearest neighbor may outperform more elaborate classifiers such as support vector machines, especially if some measure of classification confidence is considered in the process. This work has been supported in part by the Spanish Ministry of Education and Science under grants CSD2007-00018 Consolider Ingenio 2010 and TIN2009-14205, and by Fundació Caixa Castelló-Bancaixa under grant P1-1B2009-04.
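
    To make the setting concrete, here is one possible reading of a passive versus an active strategy with a nearest neighbor classifier. The abstract does not specify the three strategies, so everything below is an assumption: the function names, the choice to classify at arrival from the early attributes only, and the choice to self-label the completed instance before adding it to the reference set.

```python
import math


def nearest_label(reference, x, dims):
    """1-NN label using only the attribute positions listed in dims."""
    def dist(item):
        return math.sqrt(sum((item[0][d] - x[d]) ** 2 for d in dims))
    return min(reference, key=dist)[1]


def run_stream(reference, stream, active):
    """Classify instances whose last attribute arrives after a delay.

    stream holds (early_attributes, delayed_attribute, true_label) triples;
    the true label is used only to count errors, never for learning.
    Returns the error rate and the final reference set.
    """
    reference = list(reference)
    errors = 0
    for early, delayed, true_label in stream:
        # Classify at arrival time from the attributes available so far.
        predicted = nearest_label(reference, early, range(len(early)))
        errors += predicted != true_label
        if active:
            # When the delayed attribute arrives, complete the instance,
            # re-estimate its label from all attributes, and store it.
            complete = tuple(early) + (delayed,)
            label = nearest_label(reference, complete, range(len(complete)))
            reference.append((complete, label))
    return errors / len(stream), reference


# Toy data (made up): two labelled, complete instances and a short stream.
labelled = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]
stream = [((0.2,), 0.1, "a"), ((0.8,), 0.9, "b"), ((0.45,), 0.2, "a")]

passive_error, passive_ref = run_stream(labelled, stream, active=False)
active_error, active_ref = run_stream(labelled, stream, active=True)
```

    The passive run leaves the reference set at its two original instances, while the active run grows it with self-labelled instances; whether that helps depends on the data, which is the question the paper studies empirically.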

    RACE (Rapid Arterial oCclusion Evaluation): Design and prehospital validation of a neurological scale for predicting proximal arterial occlusion in patients with acute ischemic stroke of the anterior cerebral circulation

    The Rapid Arterial oCclusion Evaluation is a prehospital neurological scale that predicts the presence of a proximal arterial occlusion in patients with acute ischemic stroke of the anterior cerebral circulation. It was designed by retrospectively assessing 654 patients with this type of stroke and selecting the combination of National Institutes of Health Stroke Scale items most strongly associated with the presence of a proximal arterial occlusion: facial paresis, arm paresis, leg paresis, oculocephalic deviation, and agnosia/aphasia. It was validated by prospectively assessing 93 Stroke Code (Código Ictus) activations, showing a sensitivity of 88% and a specificity of 65% for a score ≥ 4.
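
    For readers less familiar with these measures, the short sketch below shows how sensitivity and specificity are computed for a score cutoff of 4. The eight scores and outcomes are invented for the example and are not data from the study; the function name is likewise hypothetical.

```python
def sensitivity_specificity(scores, has_occlusion, cutoff=4):
    """Treat score >= cutoff as a positive screen and compare with the truth."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, has_occlusion):
        positive = score >= cutoff
        if positive and truth:
            tp += 1          # occlusion present and flagged
        elif positive and not truth:
            fp += 1          # flagged, but no occlusion
        elif truth:
            fn += 1          # occlusion present but missed
        else:
            tn += 1          # correctly not flagged
    sensitivity = tp / (tp + fn)   # share of true occlusions detected
    specificity = tn / (tn + fp)   # share of non-occlusions correctly excluded
    return sensitivity, specificity


# Made-up scores for eight hypothetical patients (not study data).
scores = [6, 5, 4, 2, 3, 1, 0, 5]
has_occlusion = [True, True, True, True, False, False, False, False]

sensitivity, specificity = sensitivity_specificity(scores, has_occlusion)
```

    Lowering the cutoff flags more patients, which can only keep or raise sensitivity and keep or lower specificity.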

    Machine learning methods to forecast temperature in buildings

    Efficient management of energy in buildings saves a substantial amount of resources, both economic and technological. As a consequence, research in this field is very active. One of the keys to energy management is the prediction of the variables that directly affect building energy consumption and personal comfort. Prominent among these variables is the temperature in each room of a building. In this work we apply different machine learning techniques, along with other classical ones, to predict the temperatures in different rooms. The results obtained demonstrate the validity of these techniques for predicting temperatures and, therefore, for establishing optimal energy consumption policies.

    Early detection of mechanical damage in mango using NIR hyperspectral images and machine learning

    Mango fruit are sensitive and can easily develop brown spots after suffering mechanical stress during postharvest handling, transport and marketing. The manual inspection used today cannot detect the damage in very early stages of maturity, and to date no automatic tool capable of such detection has been developed, since current systems based on machine vision only detect very visible damage. The application of hyperspectral imaging to the postharvest quality inspection of fruit is relatively recent, and research is still underway to find a method of estimating internal properties or detecting invisible damage. This work describes a new system to evaluate mechanically induced damage in the pericarp of ‘Manila’ mangos at different stages of ripeness based on the analysis of hyperspectral images. Images of damaged and intact areas of mangos were acquired in the range 650–1100 nm using a hyperspectral computer vision system and then analysed to select the most discriminating wavelengths for distinguishing and classifying the two zones. Eleven feature-selection methods were used and compared to determine the wavelengths, while five classification methods were used to segment the resulting multispectral images and classify the skin of the mangos as sound or damaged. Using k-Nearest Neighbours and the whole spectra, a 97.9% rate of correct pixel classification was achieved on the third day after the damage had been caused. The figure dropped to 91.4% when only the most discriminant bands were used.
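
    The comparison between whole spectra and selected bands can be pictured with a small k-Nearest Neighbours sketch. The four-band spectra, the chosen band indices and the function name are synthetic assumptions made for this example; they are not measurements or wavelengths from the study.

```python
import math
from collections import Counter


def knn_predict(train, spectrum, k, bands=None):
    """Majority vote among the k nearest training spectra.

    bands optionally restricts the distance to a subset of band indices.
    """
    bands = range(len(spectrum)) if bands is None else bands

    def dist(item):
        return math.sqrt(sum((item[0][b] - spectrum[b]) ** 2 for b in bands))

    neighbours = sorted(train, key=dist)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Synthetic 4-band "pixel spectra" (made up, not measured reflectances).
train = [
    ((0.80, 0.78, 0.60, 0.55), "sound"),
    ((0.82, 0.80, 0.62, 0.58), "sound"),
    ((0.79, 0.77, 0.61, 0.54), "sound"),
    ((0.81, 0.79, 0.40, 0.35), "damaged"),
    ((0.80, 0.76, 0.38, 0.33), "damaged"),
    ((0.83, 0.78, 0.42, 0.36), "damaged"),
]
pixel = (0.80, 0.78, 0.41, 0.36)

full = knn_predict(train, pixel, k=3)                    # all bands
reduced = knn_predict(train, pixel, k=3, bands=[2, 3])   # selected bands only
```

    In this toy case both calls agree because the two selected bands carry all the class information; on real images, dropping bands trades some accuracy for a cheaper multispectral system, as the reported figures suggest.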